In probability theory, the continuous mapping theorem states that continuous functions preserve limits even if their arguments are sequences of random variables. A continuous function, in Heine’s definition, is a function that maps convergent sequences into convergent sequences: if ''x_n'' → ''x'' then ''g''(''x_n'') → ''g''(''x''). The ''continuous mapping theorem'' states that this remains true if we replace the deterministic sequence {''x_n''} with a sequence of random variables {''X_n''}, and replace the standard notion of convergence of real numbers “→” with one of the types of convergence of random variables.

This theorem was first proved by Henry Mann and Abraham Wald in 1943, and it is therefore sometimes called the Mann–Wald theorem. Meanwhile, Denis Sargan refers to it as the general transformation theorem.

Statement

Let {''X_n''}, ''X'' be random elements defined on a metric space ''S''. Suppose a function ''g'': ''S'' → ''S′'' (where ''S′'' is another metric space) has the set of discontinuity points ''D_g'' such that Pr(''X'' ∈ ''D_g'') = 0. Then

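\begin{align}
X_n \ \xrightarrow{d}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{d}\ g(X); \\
X_n \ \xrightarrow{p}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{p}\ g(X); \\
X_n \ \xrightarrow{\text{a.s.}}\ X \quad & \Rightarrow\quad g(X_n)\ \xrightarrow{\text{a.s.}}\ g(X).
\end{align}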

where the superscripts "d", "p", and "a.s." denote convergence in distribution, convergence in probability, and almost sure convergence, respectively.
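The hypothesis Pr(''X'' ∈ ''D_g'') = 0 cannot simply be dropped. The following is a small illustrative counterexample of our own choosing (not taken from the source). It uses degenerate random variables on ''S'' = ℝ:

X_n \equiv \tfrac12 + \tfrac1n \ \xrightarrow{\text{a.s.}}\ X \equiv \tfrac12, \qquad g(x) = \mathbf{1}\{x > \tfrac12\}, \qquad g(X_n) \equiv 1 \not\to 0 \equiv g(X).

Here ''D_g'' = {1/2} and Pr(''X'' ∈ ''D_g'') = 1, so the condition fails. Almost sure convergence implies convergence in probability and in distribution, so ''X_n'' → ''X'' in all three senses, while ''g''(''X_n'') → ''g''(''X'') holds in none of them.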

Proof

This proof has been adapted from

Spaces ''S'' and ''S′'' are equipped with certain metrics. For simplicity we will denote both of these metrics using the |''x'' − ''y''| notation, even though the metrics may be arbitrary and not necessarily Euclidean.

Convergence in distribution

We will need a particular statement from the portmanteau theorem: that convergence in distribution

X_n \xrightarrow{d} X

is equivalent to

\mathbb E f(X_n) \to \mathbb E f(X)

for every bounded continuous functional ''f''. So it suffices to prove that

\mathbb E f(g(X_n)) \to \mathbb E f(g(X))

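for every bounded continuous functional ''f''. Note that

F = f \circ g

is bounded, and it is continuous at every point outside ''D_g''. If ''g'' is continuous everywhere, then ''F'' is a bounded continuous functional and the claim follows from the statement above.

For general ''g'', use another equivalent condition from the portmanteau theorem: ''X_n'' converges to ''X'' in distribution if and only if lim sup_{''n'' → ∞} Pr(''X_n'' ∈ ''C'') ≤ Pr(''X'' ∈ ''C'') for every closed set ''C''. Fix a closed set ''C'' ⊆ ''S′''. Suppose ''x'' lies in the closure of the preimage ''g''^{−1}(''C'') and ''x'' ∉ ''D_g''. Then ''g''(''x'') is a limit of points of ''C'', so ''g''(''x'') ∈ ''C''. Hence the closure of ''g''^{−1}(''C'') is contained in ''g''^{−1}(''C'') ∪ ''D_g'', and therefore

\limsup_{n\to\infty}\Pr\big(g(X_n)\in C\big) \le \limsup_{n\to\infty}\Pr\big(X_n\in\overline{g^{-1}(C)}\big) \le \Pr\big(X\in\overline{g^{-1}(C)}\big) \le \Pr\big(g(X)\in C\big) + \Pr(X\in D_g) = \Pr\big(g(X)\in C\big).

This shows that ''g''(''X_n'') converges to ''g''(''X'') in distribution.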
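The Python snippet below is a minimal numerical sketch of the distributional case. All choices are ours, not the source's:

* ''X_n'' is uniform on {1/''n'', 2/''n'', …, 1}, which converges in distribution to ''X'' ~ Uniform(0, 1).
* ''g''(''x'') = 1{''x'' > 1/2} is discontinuous, but Pr(''X'' ∈ ''D_g'') = Pr(''X'' = 1/2) = 0.
* ''f'' = cos is the bounded continuous test function.

The helper names `law_of_g_Xn` and `expect_f_of_g` are hypothetical. The snippet computes the law of ''g''(''X_n'') exactly with fractions and prints the gap between E ''f''(''g''(''X_n'')) and E ''f''(''g''(''X'')) = (cos 1 + 1)/2.

```python
from fractions import Fraction
import math

HALF = Fraction(1, 2)

def g(x):
    """Hypothetical map g(x) = 1{x > 1/2}; its only discontinuity is x = 1/2."""
    return 1 if x > HALF else 0

def law_of_g_Xn(n):
    """Exact Pr(g(X_n) = 1) for X_n uniform on {1/n, 2/n, ..., n/n}."""
    hits = sum(g(Fraction(k, n)) for k in range(1, n + 1))
    return Fraction(hits, n)

def expect_f_of_g(p_one, f):
    """E f(Y) for a {0, 1}-valued Y with Pr(Y = 1) = p_one."""
    p = float(p_one)
    return p * f(1) + (1 - p) * f(0)

f = math.cos  # an arbitrary bounded continuous test function
# For X ~ Uniform(0, 1), Pr(g(X) = 1) = Pr(X > 1/2) = 1/2.
limit = expect_f_of_g(HALF, f)

for n in (3, 10, 101, 1000):
    p = law_of_g_Xn(n)
    print(n, p, abs(expect_f_of_g(p, f) - limit))
```

For odd ''n'' the gap in Pr(''g''(''X_n'') = 1) is exactly 1/(2''n''), and for even ''n'' it is zero, as the theorem predicts. Because ''g'' is not continuous, this example relies on the condition Pr(''X'' ∈ ''D_g'') = 0 rather than on continuity of ''f'' ∘ ''g''.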

Convergence in probability

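Fix an arbitrary ''ε'' > 0. Then for any ''δ'' > 0 consider the set ''B_δ'' defined as

B_\delta = \big\{x\in S\setminus D_g :\ \exists\, y\in S \text{ with } |x-y|<\delta \text{ and } |g(x)-g(y)|>\varepsilon\big\}.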

This is the set of continuity points ''x'' of the function ''g''(·) for which it is possible to find, within the ''δ''-neighborhood of ''x'', a point which maps outside the ''ε''-neighborhood of ''g''(''x''). The sets ''B_δ'' shrink as ''δ'' decreases. By the definition of continuity, every continuity point of ''g'' drops out of ''B_δ'' once ''δ'' is small enough, so lim_{''δ'' → 0} ''B_δ'' = ∅.

Now suppose that |''g''(''X'') − ''g''(''X_n'')| > ''ε''. Then at least one of the following is true: either |''X'' − ''X_n''| ≥ ''δ'', or ''X'' ∈ ''D_g'', or ''X'' ∈ ''B_δ''. Indeed, if |''X'' − ''X_n''| < ''δ'' and ''X'' ∉ ''D_g'', then ''X'' ∈ ''B_δ'' with ''y'' = ''X_n''. In terms of probabilities this can be written as

\Pr\big(\big|g(X_n)-g(X)\big|>\varepsilon\big) \leq
    \Pr\big(|X_n-X|\geq\delta\big) + \Pr(X\in B_\delta) + \Pr(X\in D_g).

On the right-hand side, the first term converges to zero as ''n'' → ∞ for any fixed ''δ'', by the definition of convergence in probability of the sequence {''X_n''}. The second term does not depend on ''n''. It converges to zero as ''δ'' → 0 by continuity of probability from above, since the sets ''B_δ'' decrease to the empty set. The last term is identically equal to zero by assumption of the theorem. Therefore, taking the limit superior as ''n'' → ∞ and then letting ''δ'' → 0, we conclude that

\lim_{n\to\infty}\Pr\big(\big|g(X_n)-g(X)\big|>\varepsilon\big) = 0,

which means that ''g''(''X_n'') converges to ''g''(''X'') in probability.
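The following is a Monte Carlo sketch of the convergence-in-probability case, under an assumed model of our own choosing:

* ''X'' ~ N(0, 1).
* ''X_n'' = ''X'' + ''Z''/''n'', with ''Z'' ~ N(0, 1) independent of ''X'', so ''X_n'' → ''X'' in probability.
* ''g'' = sign, whose only discontinuity point 0 is hit by ''X'' with probability zero.

For 0 < ''ε'' < 2, the event |''g''(''X_n'') − ''g''(''X'')| > ''ε'' coincides, up to a null set, with the event that the signs differ. For this Gaussian pair, the correlation is 1/√(1 + 1/''n''²), and the bivariate normal orthant formula gives that probability as arctan(1/''n'')/π. The code uses this value as a cross-check. The helpers `estimate_mismatch` and `exact_mismatch` are hypothetical names.

```python
import math
import random

def sign(x):
    """g(x) = sign(x); discontinuous only at x = 0."""
    return (x > 0) - (x < 0)

def estimate_mismatch(n, eps=0.5, trials=100_000, seed=0):
    """Monte Carlo estimate of Pr(|g(X_n) - g(X)| > eps).

    Assumed model (illustration only): X ~ N(0, 1) and
    X_n = X + Z / n with Z ~ N(0, 1) independent of X.
    """
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        xn = x + rng.gauss(0.0, 1.0) / n
        if abs(sign(xn) - sign(x)) > eps:
            count += 1
    return count / trials

def exact_mismatch(n):
    """Pr(sign(X_n) != sign(X)) = arctan(1/n) / pi for the model above."""
    return math.atan(1.0 / n) / math.pi

for n in (1, 10, 100):
    print(n, estimate_mismatch(n), exact_mismatch(n))
```

The estimated probability falls roughly like 1/(π''n''), in line with the bound derived above. Any fixed ''ε'' in (0, 2) gives the same event here, because ''g'' takes only the values −1 and 1 almost surely.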

Almost sure convergence

By definition of the continuity of the function ''g''(·),

\lim_{n\to\infty}X_n(\omega) = X(\omega) \quad\Rightarrow\quad \lim_{n\to\infty}g(X_n(\omega)) = g(X(\omega))

at each point ''X''(''ω'') where ''g''(·) is continuous. Therefore,

\begin{align}
  \Pr\left(\lim_{n\to\infty}g(X_n) = g(X)\right)
  &\geq \Pr\left(\lim_{n\to\infty}g(X_n) = g(X),\ X\notin D_g\right) \\
  &\geq \Pr\left(\lim_{n\to\infty}X_n = X,\ X\notin D_g\right)  = 1,
\end{align}

because the intersection of two almost sure events is almost sure. By definition, we conclude that ''g''(''X_n'') converges to ''g''(''X'') almost surely.
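The almost sure case is a pathwise statement, so it can be illustrated by following a single outcome ''ω''. The sketch below makes two assumptions:

* The path is ''X_n''(''ω'') = ''X''(''ω'') + (−1)^''n''/''n'', which converges to ''X''(''ω'') for every ''ω''.
* ''g'' = sign, so ''D_g'' = {0}.

The helpers `g_along_path` and `converges_to` are hypothetical. The output shows ''g''(''X_n''(''ω'')) settling at ''g''(''X''(''ω'')) when ''X''(''ω'') is a continuity point, and oscillating when ''X''(''ω'') ∈ ''D_g''. The theorem's hypothesis says such bad outcomes have probability zero.

```python
def sign(x):
    """g(x) = sign(x); its only discontinuity point is 0, so D_g = {0}."""
    return (x > 0) - (x < 0)

def g_along_path(x, n_max):
    """Values g(X_n(omega)) for n = 1..n_max along one fixed outcome omega.

    Assumed path: X(omega) = x and X_n(omega) = x + (-1)**n / n.
    """
    return [sign(x + (-1) ** n / n) for n in range(1, n_max + 1)]

def converges_to(values, target, tail=10):
    """Crude finite check: the last `tail` values all equal `target`."""
    return all(v == target for v in values[-tail:])

# X(omega) = 0.3 is a continuity point: g(X_n) settles at g(0.3) = 1.
print(converges_to(g_along_path(0.3, 100), sign(0.3)))  # True
# X(omega) = 0 lies in D_g: g(X_n) alternates and never approaches g(0) = 0.
print(g_along_path(0.0, 6))  # [-1, 1, -1, 1, -1, 1]
```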

References